Quantitative Structure-Activity Relationships: Linear Regression Modelling and Validation Strategies by Example
Abstract
Quantitative structure-activity relationships are mathematical models constructed on the hypothesis that the structure of chemical compounds is related to their biological activity. A linear regression model is often used to estimate and predict the relationship between a measured activity and measured or calculated descriptors. Linear regression helps to answer three main questions: does the biological activity depend on structural information; if so, is the relationship linear; and if so, how well does the model predict the biological activity of new compounds? This manuscript presents the steps of linear regression analysis, moving from theory to an example conducted on sets of endocrine-disrupting chemicals.
Keywords: robust regression; validation; diagnostic; predictive power; quantitative structure-activity relationships (QSARs)
I. BRIEF HISTORY OF LINEAR REGRESSION
Linear regression analysis is used in life science research to describe the strength of the association between an outcome and factors of interest, to adjust data for covariates or confounders, to identify predictors (factors that affect the outcome), and/or to predict the outcome [1]. Sir Francis Galton can be considered to have provided the initial inspiration that led to correlation and regression. The fundamentals of correlation were discussed by Bravais [2], who presented the correlation of two and three variables. Galton improved the notation by introducing the "Galton function" for the correlation coefficient (r); this function can be found in Bravais' work, but not as a single symbol. In 1892, Edgeworth showed how to extend Bravais' method to higher degrees of correlation [3] and expressed his results in terms of "Galton's function". Galton used regression to understand heredity and reported a slope of 0.33 describing the relationship between extremely large or small mother pea seeds and their less extreme daughter seeds [4,5].
Galton appears to have built regression analysis on the work of Adolphe Quetelet, who is known as the first scientist to apply statistical methods systematically to human data [6]. Quetelet also showed that diverse aggregated data follow normal distributions [6]. Galton was able to fit all the data with a single line, and he denoted the slope of this line by "r" [7]; this symbol was later used for the correlation coefficient [8]. Pearson demonstrated in 1896 that the optimum values of the slope and the correlation coefficient could be calculated from the product-moment [8]. At the same time, George Yule refined regression analysis [9], [10], [11], solving his regression problem by minimizing the sum of squared errors [9,10], a method first presented by Legendre in 1805 [12].
II. LINEAR REGRESSION IN QSAR ANALYSIS
Quantitative structure-activity relationships (QSARs) are mathematical models that link chemical structure and pharmacological activity/property quantitatively for a series of compounds [13]. These approaches rest on the assumption that the structure of a chemical compound (its geometric, topological, steric, electronic, and other properties) contains features responsible for its physical, chemical, and/or biological properties [14]. This assumption can be summarized as "similar compounds have similar properties" [15]. The two main fields in which linear regression analysis has found application are drug discovery [16], [17] and toxicity prediction [18], [19]. In both fields, linear regression is used mainly to predict rather than to estimate (the model is used to quickly determine the activity/property of new or uninvestigated compounds) [20].
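For a single descriptor, the least-squares criterion attributed above to Legendre and Yule, and Pearson's product-moment formulas, can be sketched in Python with NumPy. The helper names (pearson_r, least_squares_line, sse) and the five data points are hypothetical illustrations, not taken from the paper; the sketch only shows that the least-squares slope equals r multiplied by the ratio of standard deviations, and how the fitted line is then used to predict an uninvestigated compound.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def least_squares_line(x, y):
    """Intercept a0 and slope a1 minimizing sum((y - a0 - a1 * x) ** 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    a1 = float((dx * dy).sum() / (dx ** 2).sum())
    a0 = float(y.mean() - a1 * x.mean())
    return a0, a1

def sse(a0, a1, x, y):
    """Sum of squared errors of the line y = a0 + a1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(((y - (a0 + a1 * x)) ** 2).sum())

# Hypothetical toy data: one descriptor (x) and a measured activity (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

a0, a1 = least_squares_line(x, y)
r = pearson_r(x, y)

# Product-moment link: the least-squares slope equals r * (s_y / s_x)
assert np.isclose(a1, r * y.std() / x.std())
# Perturbing the slope away from the least-squares value increases the SSE
assert sse(a0, a1, x, y) < sse(a0, a1 + 0.1, x, y)

# Prediction for a new, uninvestigated compound with descriptor value 6.0
y_new = a0 + a1 * 6.0
```

The same closed-form slope and intercept are what a general least-squares solver returns for one descriptor; the explicit formulas are used here only to make the product-moment connection visible.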
Linear regression is used in QSAR analysis to link linearly the activity/property of chemical compounds (the measured or observed outcome variable, denoted Y) to values derived from the structure of the compounds, generally called descriptors (independent variables assumed to be free of error, denoted X(s)). The multiple linear regression (MLR) expression is presented in Eq. (1):
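As a sketch of how such a model is commonly fitted and checked, the Python/NumPy code below assumes the generic MLR form Y = a0 + a1*X1 + ... + ak*Xk + e (the coefficient symbols a0 ... ak are our notation, not necessarily that of Eq. (1)), estimates the coefficients by ordinary least squares, and reports R² together with a leave-one-out Q², one common measure of predictive power. The functions fit_mlr, predict_mlr, r_squared and q2_loo, and the simulated data, are hypothetical illustrations rather than the authors' procedure.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of Y = a0 + a1*X1 + ... + ak*Xk.

    X: (n_compounds, k_descriptors) array; y: (n_compounds,) activities.
    Returns the coefficient vector [a0, a1, ..., ak].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted activities for compounds described by the rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

def r_squared(y, y_hat):
    """Coefficient of determination on the data used for fitting."""
    y = np.asarray(y, dtype=float)
    ss_res = float(((y - y_hat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out Q^2 (PRESS-based): each compound is predicted
    by a model fitted without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # drop compound i from the training set
        coef = fit_mlr(X[keep], y[keep])
        press += float((y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2)
    return 1.0 - press / float(((y - y.mean()) ** 2).sum())

# Hypothetical data: 20 compounds, 2 descriptors, a known linear trend plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=20)

coef = fit_mlr(X, y)
r2 = r_squared(y, predict_mlr(coef, X))
q2 = q2_loo(X, y)
y_new = predict_mlr(coef, [[2.0, 1.0]])[0]  # a new, untested compound
```

Because a leave-one-out residual from least squares is never smaller in magnitude than the ordinary residual for the same compound, Q² computed this way cannot exceed R²; a large gap between the two is a common warning sign of overfitting.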
Publication date: 2014